Comparative Study between First and All-Author Co-Citation Analysis Based on Citation Indexes Generated from XML Data

Authors

Jesper Wiborg Schneider

Birger Larsen

Peter Ingwersen

Abstract

The study presents a comparative analysis of first-author and all-author co-citation analyses, as well as a comparison between two matrix generation approaches. We thus continue the latest research in author co-citation analysis (ACA), where the results of traditional first-author analyses based on ISI citation indexes are challenged by incorporating all authors from the cited references. Identifying all cited authors from references in source papers is an extremely cumbersome process if the Thomson ISI citation indexes are used as a basis. Due to the difficulty of obtaining all-author co-citation data, few such studies exist. In order to study all-author co-citation, we use a citation index generated from documents in XML code. This allows us to carry out a comparative study of first-author and all-author co-citation analyses based on the largest set of references and the broadest research domain used to date.

Introduction

Author co-citation analysis (ACA), introduced by White and Griffith (1981), is a technique for mapping the 'intellectual structure' of a research field, where the field is defined as a coherent literature set. The intellectual structure is mapped from the oeuvres of the most cited and co-cited first authors in a particular literature set. Since its introduction, ACA has become a popular and widely used technique. Recently, however, a debate concerning methodological procedures in ACA has emerged. In particular, the approach to ACA developed at Drexel University (e.g., White & Griffith, 1981; McCain, 1990) has been the focus of the current debate.
Essentially, four methodological issues have been debated: 1) scalability (e.g., Chen, 1999), 2) units of analysis and their definition (e.g., Persson, 2001; Zhao, 2006; Rousseau & Zuccala, 2004), 3) the choice of proximity measures (e.g., Ahlgren, Jarneving, & Rousseau, 2003; Schneider & Borlund, 2007a; 2007b), and, most recently, 4) the generation and transformation of matrices (Leydesdorff & Vaughan, 2006; Schneider & Borlund, 2007a). The present paper addresses the second and fourth issues in a comparative study of first-author and all-author co-citation analysis, based on different matrix generation approaches and on structured XML documents that allow for the construction of ad hoc citation indexes. The paper is structured as follows. The next section briefly discusses previous research on all-author co-citation analyses and matrix generation. The subsequent section describes the research method of the study, i.e., data collection and data analysis. The following section presents and discusses the results, and the paper ends with a conclusion.

Previous Work on All-author Co-citations and Matrix Generation

In several respects, the methodological approach to ACA developed at Drexel University has been shaped by specific technical features that seem to have imposed constraints on the ACA methodology. Most important are the dependence upon the standardized cited reference strings in Thomson ISI's citation indexes and the use of the SPSS statistical package as the tool for multivariate analyses. The most obvious example is that the cited reference strings allow only first authors as units of analysis in ACA. As a result, ACA methodology takes into account only first authors in the definition of author co-citation counts.
Two authors are considered to be co-cited when at least one document from each author's oeuvre occurs in the same reference list of a citing document, where an author's oeuvre is defined as all the works with the author as first author (McCain, 1990). This definition has rarely been challenged. Persson (2001) is the first empirical study to compare the potential difference in intellectual structure between mappings produced by first-author and all-author co-citation analyses. The study is based on 7001 source documents from library and information science journals in the CD-ROM version of the Social Sciences Citation Index 1986-1996. It investigates how these source documents have been co-cited with each other within the dataset by use of multidimensional scaling (MDS). The co-citations of source documents amount to some 7% of the total number of references in the dataset; the remaining 93% go to non-source documents not indexed by the Thomson ISI citation indexes. The study demonstrates that first-author ACA leaves out several influential researchers compared to all-author ACA, although the subfield structure tends to be roughly the same for both methods. The study is somewhat limited due to its dependence on a limited set of source documents, the sparse details provided concerning the definition and calculation of co-citations, and the informal evaluation procedures. Nevertheless, the results are indicative, as they are partly confirmed in a smaller study by Zhao (2006).

All-author vs. First-author Co-citation Analyses

Zhao (2006) is the most detailed theoretical and empirical investigation of all-author co-citation analysis to date, including a definition of co-citation counts reminiscent of the definitions given earlier by Rousseau and Zuccala (2004). The study defines three different counting methods: first-author co-citation, inclusive all-author co-citation, and exclusive all-author co-citation.
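The first of these counting methods, first-author co-citation as defined by McCain (1990), can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in this or any cited study. It assumes a hypothetical input format in which each citing document is a list of cited references and each reference is a tuple of author names in byline order. It also assumes that each citing document adds at most one to a pair's count. The function name `first_author_cocitations` and the placeholder author names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def first_author_cocitations(citing_docs):
    """Count first-author co-citations.

    Assumed input (hypothetical): a list of citing documents, each a list of
    cited references, each reference a tuple of author names in byline order.
    Two first authors are co-cited by a citing document when at least one
    work from each of their first-author oeuvres appears in its reference
    list. Each citing document adds at most 1 to a pair's count.
    """
    counts = Counter()
    for references in citing_docs:
        # Keep only the first author of each cited reference.
        first_authors = {ref[0] for ref in references if ref}
        # Every unordered pair of distinct first authors is co-cited once here.
        for a, b in combinations(sorted(first_authors), 2):
            counts[(a, b)] += 1
    return counts


# Placeholder data: two citing documents with invented reference lists.
docs = [
    [("A", "D"), ("B",), ("C",)],
    [("A",), ("B", "A")],
]
counts = first_author_cocitations(docs)
print(counts[("A", "B")])  # 2
print(counts[("A", "D")])  # 0: D is never a first author in this data
```

Author D is invisible to this counting scheme even though D co-authored a cited work, which is the limitation the all-author variants address.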
Likewise, as a consequence of all-author co-citation analysis, the study redefines "...an author's oeuvre as all works with this author as one of the authors of each of the works." (Zhao, 2006, p. 1580). The distinction between inclusive and exclusive all-author co-citation follows directly from this definition of an author's oeuvre: two authors may also be considered co-cited when a paper they co-authored is cited. Thus, cited co-authorships can also be counted as co-citations. Inclusive all-author co-citation analysis counts cited co-authorships, whereas exclusive all-author co-citation analysis does not. Typically, author co-citations and co-authorships are treated as different units of analysis, where the former is used to map intellectual structures and the latter to investigate research collaboration. Rousseau and Zuccala (2004), in their definition, suggest that such an approach supports the view that authors, regardless of their position in the author list, can contribute substantially to the development of a research area, and that it gives a more accurate portrayal of an individual author's contribution to research areas where high rates of co-authorship are prevalent. Apart from the novel definition of all-author co-citation counting, Zhao (2006) adheres to a traditional Drexel approach to ACA (see below). The dataset was rather small: it consisted of 312 publications in PDF on the subject of XML, identified using CiteSeer. The 312 publications contained 4578 citations, which were used as the basis for the co-citation analysis. The results of the study indicate that all-author co-citation counting creates more coherent groups of authors, which supposedly should be considerably easier to identify and interpret.
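The inclusive/exclusive distinction can be sketched with the same kind of hypothetical input: citing documents as lists of references, and references as tuples of author names. This reflects one reading of the definitions above, not Zhao's (2006) actual implementation. The exclusive variant here assumes that two authors must be represented by two *different* cited works in the same reference list, so a single cited co-authored paper does not make them co-cited. The function name, the `inclusive` flag, and the placeholder data are hypothetical, and each citing document is again assumed to add at most one to a pair's count.

```python
from collections import Counter
from itertools import combinations


def all_author_cocitations(citing_docs, inclusive=True):
    """Count all-author co-citations (one hypothetical reading of Zhao, 2006).

    Every author of a cited work, not just the first, belongs to its oeuvre.
    inclusive=True: a and b are co-cited if both appear anywhere in the
    reference list, including as co-authors of the same cited work.
    inclusive=False: a and b must appear in two different cited works.
    Each citing document adds at most 1 to a pair's count.
    """
    counts = Counter()
    for references in citing_docs:
        # Map each author to the positions of the references they appear in.
        works_by_author = {}
        for idx, ref in enumerate(references):
            for author in set(ref):
                works_by_author.setdefault(author, set()).add(idx)
        for a, b in combinations(sorted(works_by_author), 2):
            if inclusive:
                # Both authors occur somewhere in this reference list.
                cocited = True
            else:
                # Require a work by a and a different work by b.
                cocited = any(x != y for x in works_by_author[a]
                              for y in works_by_author[b])
            if cocited:
                counts[(a, b)] += 1
    return counts


# Placeholder data: A and B co-author one work that is cited twice.
docs = [
    [("A", "B")],
    [("A", "B"), ("C",)],
    [("A",), ("B",)],
]
inc = all_author_cocitations(docs, inclusive=True)
exc = all_author_cocitations(docs, inclusive=False)
print(inc[("A", "B")], exc[("A", "B")])  # 3 1
```

The gap between 3 and 1 for the pair (A, B) comes entirely from the two citing documents in which their co-authored work is the only link. This is the part of the count that inclusive analysis adds.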
Nevertheless, because citation thresholds were applied straightforwardly when selecting cited authors for inclusion, the results also show that all-author co-citation counting can lead to the identification of fewer specialties in a research field than first-author co-citation counting, that is, when the same number of top-ranked authors is selected and analyzed (Zhao, 2006). Zhao (2006) undoubtedly contributes considerably to our understanding of all-author co-citation analysis. However, for the time being, the results of the empirical study must be treated carefully until more substantial evidence is available that may or may not support its findings. The motivation for the present paper is therefore to continue the work of Zhao (2006) by further investigating inclusive all-author co-citation analysis in order to bring about a deeper empirical understanding of, and evidence concerning, this novel counting approach. The present study is the first in a series that addresses the research possibilities inherent in a citation index based on source documents formatted in XML. One such possibility is all-author co-citation analysis, and the present study is based on the largest set of citing documents applied in an all-author co-citation analysis to date.

1 CiteSeer: http://citeseer.ist.psu.edu/

Co-citation Matrix Generation

Recently, the role played by matrices in co-citation analyses has received attention. Leydesdorff and Vaughan (2006) demonstrate the fundamental difference between asymmetric data matrices (n × m) and symmetric proximity matrices (n × n), arguing that symmetric matrices of co-occurrence counts are per se proximity matrices and should be treated as such. In the Drexel approach to ACA, first-author co-citation counts are obtained by online retrieval. Subsequently, the co-citation counts are entered into a symmetric proximity matrix.
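To make the distinction by Leydesdorff and Vaughan (2006) concrete, the sketch below builds an asymmetric binary data matrix A, with citing documents as rows and cited authors as columns, and derives the symmetric co-citation matrix from it as AᵀA. The input format and the function names are hypothetical, and only first authors are used for brevity. Off-diagonal cells of AᵀA are co-citation counts, and diagonal cells count the citing documents that cite each author at all. Unlike paired online counts, the data matrix keeps the case-level information from which other proximity measures can still be computed.

```python
def occurrence_matrix(citing_docs, authors):
    """Build an n x m binary data matrix (hypothetical helper).

    Rows are citing documents (cases), columns are the given cited authors
    (variables). A cell is 1 if the document cites at least one work with
    that author as first author, else 0.
    """
    index = {name: j for j, name in enumerate(authors)}
    data = []
    for references in citing_docs:
        row = [0] * len(authors)
        for ref in references:
            if ref and ref[0] in index:
                row[index[ref[0]]] = 1
        data.append(row)
    return data


def cocitation_from_data(data):
    """Derive the symmetric m x m matrix C = A^T A from a binary data matrix.

    C[i][j] (i != j) counts citing documents citing both authors i and j;
    C[i][i] counts citing documents citing author i.
    """
    m = len(data[0]) if data else 0
    return [[sum(row[i] * row[j] for row in data) for j in range(m)]
            for i in range(m)]


# Placeholder data: the same two invented citing documents as a toy example.
authors = ["A", "B", "C"]
docs = [
    [("A", "D"), ("B",), ("C",)],
    [("A",), ("B", "A")],
]
A = occurrence_matrix(docs, authors)
C = cocitation_from_data(A)
print(A)  # [[1, 1, 1], [1, 1, 0]]
print(C)  # [[2, 2, 1], [2, 2, 1], [1, 1, 1]]
```

Going from A to C is easy, but the reverse is not possible. Once only C is stored, the information about which citing documents produced each count is gone.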
However, the desire to apply factor analysis to ACA as a more detailed exploratory tool, in order to identify latent structures and thus help interpret the mapping results, necessitates a symmetric proximity matrix of covariance or correlation coefficients. In traditional multivariate analyses, such proximity matrices are derived from an asymmetric data matrix of variables by cases. However, such a matrix is not available in the Drexel approach because of the paired online counting. As a result, an unorthodox procedure has been devised, in which the proximity matrix of co-citation counts is transformed into an additional proximity matrix of correlation coefficients derived from first-author co-citation profiles. Note that a linear transformation of a symmetric proximity matrix is not straightforward. A theoretical problem arises, as every relation in a symmetric matrix occurs twice, which evidently leads to a magnification. Further, the transformation also causes a fundamental problem in relation to the interpretation and treatment of diagonal values. SPSS offers several options for treating diagonal values, and the one most commonly used in ACA is to treat the diagonal cells as missing data (e.g., McCain, 1990). Hence, rows are treated as cases and columns as variables, yet this procedure is only allowable when computing correlation matrices. The same practice is not applicable in SPSS if one wishes to transform the proximity matrix of co-citation counts into a similarity or distance measure. Missing data undoubtedly cause some loss of information in the matrix and therefore likely influence the ensuing ordination or clustering results. White (2003, p. 1250) nevertheless asserts that the treatment of diagonal values is a minor problem. This may be true, depending on the data set at hand, but the generic problem arises from the unorthodox matrix generation (transformation) approach and could be avoided if a more traditional procedure were applied.
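The two routes to a correlation matrix discussed above can be contrasted in a short sketch. The hypothetical function `data_matrix_correlations` follows the traditional procedure: it correlates author columns across citing documents (cases) in an asymmetric data matrix. The hypothetical function `profile_correlations` imitates the Drexel-style transformation. It correlates authors' co-citation profiles in the symmetric count matrix while treating the diagonal cells as missing. We assume this corresponds to pairwise deletion, so for authors i and j only the rows other than i and j are used. Pearson's r is computed by hand to avoid dependencies. The sketch illustrates the idea and does not reproduce SPSS behaviour exactly.

```python
from math import nan, sqrt


def pearson(x, y):
    """Pearson's r for equal-length sequences; NaN if undefined."""
    n = len(x)
    if n < 2:
        return nan
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan  # a constant profile has no defined correlation
    return sxy / sqrt(sxx * syy)


def data_matrix_correlations(data):
    """Traditional route: correlate the columns (authors) of an n x m data
    matrix whose rows are cases (citing documents)."""
    m = len(data[0])
    cols = [[row[j] for row in data] for j in range(m)]
    return [[pearson(cols[i], cols[j]) for j in range(m)] for i in range(m)]


def profile_correlations(cocit):
    """Drexel-style route (approximation): correlate rows of a symmetric
    co-citation count matrix, dropping the diagonal cells of both authors
    in each pair, i.e. pairwise deletion of 'missing' diagonals."""
    m = len(cocit)
    r = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            keep = [k for k in range(m) if k not in (i, j)]
            x = [cocit[i][k] for k in keep]
            y = [cocit[j][k] for k in keep]
            r[i][j] = r[j][i] = pearson(x, y)
    return r


# Placeholder symmetric co-citation counts for four authors; the diagonal
# entries are ignored by profile_correlations.
cocit = [
    [0, 5, 4, 0],
    [5, 0, 3, 1],
    [4, 3, 0, 2],
    [0, 1, 2, 0],
]
r = profile_correlations(cocit)
print(r[0][1], r[0][3])  # 1.0 -1.0
```

With only four authors, each profile correlation rests on two values, so every coefficient is ±1. Real ACA maps use far more authors. Changing the diagonal of `cocit` leaves `r` unchanged, which is what treating the diagonal as missing means in practice, at the cost of discarding those cells.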
The question is whether the different approaches to matrix generation make a significant difference in the interpretation and mapping of ACA results. Consequently, the present study explores two of the recently debated methodological issues of ACA: first-author versus all-author co-citation analysis, and the influence of different matrix generation approaches upon mapping results. The major research questions explored in the study are:

• To what extent does a different data set support the previous findings of first-author versus all-author co-citation analysis?
• To what extent do different matrix generation techniques influence the interpretation and mapping of author co-citation data?

In order to answer these questions, we perform two first-author and two inclusive all-author co-citation analyses, one pair commencing from a data matrix and one pair commencing from a proximity matrix of co-citation counts.

Similar resources

The analysis of co-citation and word co-occurrence networks of Iranian articles in the field of dentistry

Background and Aims: Dentistry is an important profession ensuring the health of body and soul, and has a special place in the scientific productions of medical disciplines. The purpose of this study was to analyze the co-citation and word co-occurrence of Iranian research papers in the field of dentistry based on indexed documents in Web of Science from 2014 to 2018. Materials and Methods:...

Drawing Co-Citation Networks of Corona Virus Studies

Background and Aim: The purpose of the present study is to map the coronavirus domain citation network to better understand this domain based on all other citation networks. Materials and Methods: The present study is applied in terms of purpose, and is descriptive scientometrics in terms of type, which has been done with the all-citation method. In this study, all scientific publications on ...

Citation analysis of graduate Dental thesis references: Before and after an intervention

Background: Introduction of Iranian National Medical Digital Library (INLM) was a huge investment during several years ago. The aim of this study was to discover the effectiveness of this scientific intervention by examination of citation pattern among graduate dental thesis during before and after of INLM accessibility. Methods: This analytical study was conducted among all of graduate dental ...

Correlation between self-citation and the immediacy index of Iranian English-language scientific-research journals in medicine indexed in the Scopus citation index, 2005-2009

Background and Aim: Self-citation, as one of the limitations of citation analysis, unusually affects the ranking of journals. This study aims to evaluate the degree of relationship between self-citation and immediacy index correlation of Iranian medical journals indexed in Scopus Citation Index between 2005 and 2009. Materials and Methods: The method of the study is survey-descriptive in which...

Towards all-author co-citation analysis

A short version of this paper was published in Proceedings of ASIS&T 2005 Annual Conference. The present study examines one of the fundamental aspects of author co-citation analysis (ACA) – the way cocitation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings ...

Publication date: 2007
